4 research outputs found

    Attention-Privileged Reinforcement Learning

    Image-based Reinforcement Learning is known to suffer from poor sample efficiency and generalisation to unseen visuals such as distractors (task-independent aspects of the observation space). Visual domain randomisation encourages transfer by training over visual factors of variation that may be encountered in the target domain. This increases learning complexity, can negatively impact learning rate and performance, and requires knowledge of potential variations during deployment. In this paper, we introduce Attention-Privileged Reinforcement Learning (APRiL), which uses a self-supervised attention mechanism to significantly alleviate these drawbacks: by focusing on task-relevant aspects of the observations, attention provides robustness to distractors as well as significantly increased learning efficiency. APRiL trains two attention-augmented actor-critic agents: one purely based on image observations, available across training and transfer domains; and one with access to privileged information (such as environment states) available only during training. Experience is shared between both agents and their attention mechanisms are aligned. The image-based policy can then be deployed without access to privileged information. We experimentally demonstrate accelerated and more robust learning on a diverse set of domains, leading to improved final performance for environments both within and outside the training distribution.
    Comment: Published at the Conference on Robot Learning (CoRL) 2020.
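    A minimal structural sketch of the idea described in the abstract, assuming a PyTorch-style setup: two attention-augmented actor-critic learners (one on privileged state, one on images) consume the same experience batch while the image agent's attention is regularised towards the privileged agent's. All class and variable names are illustrative, and feeding identical inputs to both agents is a simplification for brevity, not the authors' implementation.

    import torch
    import torch.nn as nn

    class AttentionActorCritic(nn.Module):
        """Actor-critic with a soft self-attention mask over its input features."""
        def __init__(self, obs_dim, act_dim, hidden=128):
            super().__init__()
            self.attention = nn.Sequential(nn.Linear(obs_dim, obs_dim), nn.Sigmoid())
            self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, act_dim), nn.Tanh())
            self.critic = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                        nn.Linear(hidden, 1))

        def forward(self, obs):
            attn = self.attention(obs)   # soft mask highlighting task-relevant features
            masked = obs * attn
            return self.actor(masked), self.critic(masked), attn

    # Both learners see the same shared experience batch; the image agent's
    # attention is pulled towards the privileged (state) agent's attention.
    # In the real setting the two agents observe different modalities and their
    # attention maps are first mapped into a common space.
    state_agent = AttentionActorCritic(obs_dim=16, act_dim=4)  # privileged, training only
    image_agent = AttentionActorCritic(obs_dim=16, act_dim=4)  # deployable policy

    batch = torch.randn(32, 16)                                # shared experience batch
    _, _, attn_state = state_agent(batch)
    act_img, val_img, attn_img = image_agent(batch)
    alignment_loss = nn.functional.mse_loss(attn_img, attn_state.detach())

    In the full method each agent additionally optimises its own actor-critic losses on the shared experience; the alignment term above only illustrates the attention-sharing ingredient mentioned in the abstract.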

    TACO: Learning Task Decomposition via Temporal Alignment for Control

    Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, they provide training data for each policy from different high-level tasks and compose them to perform novel ones. Existing approaches to modular LfD either focus on learning a single high-level task or depend on domain knowledge and temporal segmentation. In contrast, we propose a weakly supervised, domain-agnostic approach based on task sketches, which include only the sequence of sub-tasks performed in each demonstration. Our approach simultaneously aligns the sketches with the observed demonstrations and learns the required sub-policies. This improves generalisation in comparison to separate optimisation procedures. We evaluate the approach on multiple domains, including a simulated 3D robot arm control task using purely image-based observations. The results show that our approach performs commensurately with fully supervised approaches, while requiring significantly less annotation effort.
    Comment: 12 pages. Published at ICML 2018.
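    The core alignment step lends itself to a CTC-style forward recursion. The sketch below, in plain NumPy, is a simplified illustration of aligning a task sketch (an ordered list of sub-tasks) with a demonstration by treating each sub-policy as emitting an action likelihood and a stop probability per timestep; the exact factorisation and training objective in the paper differ, and the function name and interfaces are assumptions.

    import numpy as np

    def alignment_likelihood(action_ll, stop_p):
        """Forward recursion over sketch/demonstration alignments.

        action_ll: (T, L) likelihood of the demonstrated action at time t
                   under the sub-policy for sketch entry l.
        stop_p:    (T, L) probability that sketch entry l terminates at time t.
        """
        T, L = action_ll.shape
        alpha = np.zeros((T, L))
        alpha[0, 0] = action_ll[0, 0]          # the first sub-task must be active at t=0
        for t in range(1, T):
            for l in range(L):
                stay = alpha[t - 1, l] * (1.0 - stop_p[t - 1, l])
                advance = alpha[t - 1, l - 1] * stop_p[t - 1, l - 1] if l > 0 else 0.0
                alpha[t, l] = (stay + advance) * action_ll[t, l]
        # The demonstration must end with the last sketch entry terminating.
        return alpha[T - 1, L - 1] * stop_p[T - 1, L - 1]

    # Example call with random per-timestep likelihoods for a 50-step
    # demonstration and a 3-entry sketch:
    T, L = 50, 3
    likelihood = alignment_likelihood(np.random.rand(T, L), np.random.rand(T, L))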

    Discovering knowledge abstractions for sample efficient embodied transfer learning

    This thesis concerns sample-efficient embodied machine learning. Machine learning success in sequential decision problems has been limited to domains with a narrow range of goals, requiring orders of magnitude more experience than humans. Additionally, such systems lack the ability to generalise to new, related goals. In contrast, humans are continual learners. Given their embodiment and computational constraints, humans are forced to reuse knowledge (compressed abstractions of repeated structures present across their lifetime) to tackle novel scenarios in as sample-efficient and safe a manner as possible. Similar traits are desired in robotics, since robots are also embodied learners. Taking inspiration from humans, the central claim of this thesis is that knowledge abstractions acquired from prior experience can be used to design domain-independent, sample-efficient algorithms that improve generalisation across modular domains. We refer to modular domains as Markov decision processes (MDPs) whose optimal policies can be obtained when reasoning and acting occur over compressed abstractions shared across them. The challenge is how to discover these abstractions sample-efficiently and with minimal supervision. Additionally, for embodied machine learning it is important that the approach support continuous, potentially unbounded, state-action spaces. Adhering to these constraints, we first develop novel self-supervised (Chapter 3) and weakly supervised (Chapter 4) knowledge abstraction (domain adaptation) methods for zero-shot generalisation to unseen domains. We demonstrate their potential on robotic applications including sim2real transfer (Chapter 3) and generalisation using a human-robot command interface (Chapter 4). We continue by developing novel unsupervised knowledge abstraction (transfer learning) methods for sample-efficient adaptation to unseen domains (Chapters 5 and 6). We highlight their relevance to robotics and continual learning. We introduce a hierarchical KL-regularised RL approach based on novel theory behind the transferability-expressivity trade-off of abstractions (Chapter 5) and develop the first, to our knowledge, bottleneck-options approach adhering to the aforementioned embodied machine learning constraints (Chapter 6).
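    For the KL-regularised RL approach mentioned for Chapter 5, the generic form of such an objective trades return against divergence from a reusable prior policy. The snippet below is a minimal sketch of that generic per-step objective for a discrete action space, not the thesis' hierarchical formulation; the function and variable names are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def kl_regularised_step_objective(reward, task_logits, prior_logits, alpha=0.1):
        """Generic per-step objective: r_t - alpha * KL(pi_task(.|s_t) || pi_prior(.|s_t))."""
        log_p = F.log_softmax(task_logits, dim=-1)   # task policy log-probs
        log_q = F.log_softmax(prior_logits, dim=-1)  # prior / default policy log-probs
        kl = (log_p.exp() * (log_p - log_q)).sum(dim=-1)
        return reward - alpha * kl

    # Example: a batch of one transition with a 4-action space.
    obj = kl_regularised_step_objective(torch.tensor([1.0]),
                                        torch.randn(1, 4), torch.randn(1, 4))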